NSF PAR Search | NSF Public Access Repository

TransPeakNet for solvent-aware 2D NMR prediction via multi-task pre-training and unsupervised learning

https://doi.org/10.1038/s42004-025-01455-9

Li, Yunrui; Xu, Hao; Kumar, Ambrish; Wang, Duo-Sheng; Heiss, Christian; Azadi, Parastoo; Hong, Pengyu (December 2025, Communications Chemistry)

Full Text Available
Predicting the stereoselectivity of chemical reactions by composite machine learning method

https://doi.org/10.1038/s41598-024-62158-0

Chung, Jihoon; Li, Justin; Saimon, Amirul Islam; Hong, Pengyu; Kong, Zhenyu (December 2024, Scientific Reports)

Stereoselective reactions have played a vital role in the emergence of life, evolution, human biology, and medicine. However, for a long time, most industrial and academic efforts in asymmetric synthesis have relied on trial and error. In addition, most previous studies have focused qualitatively on the influence of steric and electronic effects on stereoselective reactions. As a result, quantitatively understanding the stereoselectivity of a given chemical reaction remains extremely difficult. As a proof of principle, this paper develops a novel composite machine learning method for quantitatively predicting enantioselectivity, the degree to which one enantiomer is preferentially produced in a reaction. Specifically, it uses machine learning methods that are widely applied in data analytics, including Random Forest, Support Vector Regression, and LASSO. Bayesian optimization and permutation importance tests are also applied to support accurate prediction and an in-depth understanding of the reactions. Finally, the composite method approximates the key features of the available reactions with Gaussian mixture models, which in turn identify suitable machine learning methods for new reactions. Case studies on real stereoselective reactions show that the proposed method is effective and provides a solid foundation for application to other chemical reactions.
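The abstract mentions permutation importance tests. As a rough illustration of that general technique (not the authors' pipeline), the sketch below scores each feature by how much a fitted model's mean squared error grows when that feature's column is shuffled, without refitting. The toy data, the stand-in model `fitted_model`, and the function names are hypothetical.

```python
import random
from typing import Callable, List, Sequence

Row = Sequence[float]
Model = Callable[[Row], float]


def mse(model: Model, X: List[Row], y: List[float]) -> float:
    """Mean squared error of the model's predictions on (X, y)."""
    return sum((model(x) - t) ** 2 for x, t in zip(X, y)) / len(y)


def permutation_importance(model: Model, X: List[Row], y: List[float],
                           n_repeats: int = 10, seed: int = 0) -> List[float]:
    """Average increase in MSE when each feature column is randomly shuffled.

    The model is never refit; a larger increase means the model relies more
    heavily on that feature.
    """
    rng = random.Random(seed)
    baseline = mse(model, X, y)
    importances = []
    for j in range(len(X[0])):
        increase = 0.0
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Copy each row, replacing feature j with its shuffled value.
            X_perm = [list(row[:j]) + [v] + list(row[j + 1:]) for row, v in zip(X, column)]
            increase += mse(model, X_perm, y) - baseline
        importances.append(increase / n_repeats)
    return importances


if __name__ == "__main__":
    # Hypothetical toy data: the target depends only on feature 0.
    data_rng = random.Random(42)
    X = [[data_rng.uniform(-1, 1), data_rng.uniform(-1, 1)] for _ in range(200)]
    y = [3.0 * x[0] for x in X]

    def fitted_model(x: Row) -> float:
        # Hypothetical stand-in for a trained regressor (e.g. Random Forest or LASSO).
        return 3.0 * x[0]

    print(permutation_importance(fitted_model, X, y))  # feature 1 scores exactly 0.0
```

Because the model never reads feature 1, shuffling that column leaves every prediction unchanged, so its importance is zero; feature 0 receives a positive score.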
Graph Multi-Similarity Learning for Molecular Property Prediction

Xu, Hao; Zhou, Zhengyang; Hong, Pengyu (July 2024, AI4Science Workshop of 41st International Conference on Machine Learning)

Accurate molecular property prediction relies on effective representation learning. It is crucial to incorporate the diverse relationships between molecules characterized by multi-similarity (self-similarity and relative similarities) (Wang et al., 2019). However, current molecular representation learning methods fall short of exploring multi-similarity and often underestimate the complexity of relationships between molecules. Additionally, previous multi-similarity approaches require positive and negative pairs to be specified in order to assign distinct predefined weights to different relative similarities, which can introduce bias. In this work, we introduce the Graph Multi-Similarity Learning for Molecular Property Prediction (GraphMSL) framework, along with a novel approach to formulating a generalized multi-similarity metric without the need to define positive and negative pairs. In each chemical modality space under consideration (e.g., molecular depiction image, fingerprint, NMR, and SMILES), we first define a self-similarity metric (i.e., the similarity between an anchor molecule and another molecule) and then transform it into a generalized multi-similarity metric for the anchor through a pair weighting function. GraphMSL validates the efficacy of the multi-similarity metric on MoleculeNet datasets. Furthermore, the metrics of all modalities are integrated into a multimodal multi-similarity metric, which shows potential to further improve performance. Moreover, the focus of the model can be redirected or customized by altering the fusion function. Finally, post-hoc analyses of the learned representations show that GraphMSL is effective in drug discovery evaluations.
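To make the two-step idea in the abstract concrete (a self-similarity metric, then a pair weighting function that turns it into a metric for the anchor), here is a minimal sketch in the fingerprint modality. The Tanimoto coefficient and the softmax weighting with a `temperature` parameter are stand-ins chosen for illustration; the abstract does not specify GraphMSL's actual pair weighting function, and all names here are hypothetical.

```python
import math
from typing import Dict, FrozenSet

# A binary fingerprint represented as the set of indices of its "on" bits.
Fingerprint = FrozenSet[int]


def tanimoto(a: Fingerprint, b: Fingerprint) -> float:
    """Self-similarity between two molecules: |a & b| / |a | b|."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def multi_similarity(anchor: str, fps: Dict[str, Fingerprint],
                     temperature: float = 0.1) -> Dict[str, float]:
    """Hypothetical pair weighting: a softmax over the anchor's similarities.

    Each weight depends on the anchor's similarity to one molecule relative to
    its similarities to all the others, so no positive/negative pairs are needed.
    Assumes fps contains at least one molecule besides the anchor.
    """
    others = [name for name in fps if name != anchor]
    sims = [tanimoto(fps[anchor], fps[name]) for name in others]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp((s - m) / temperature) for s in sims]
    z = sum(exps)
    return {name: e / z for name, e in zip(others, exps)}


if __name__ == "__main__":
    fps = {
        "A": frozenset({1, 2, 3, 4}),
        "B": frozenset({1, 2, 3, 5}),
        "C": frozenset({7, 8}),
    }
    print(multi_similarity("A", fps))  # B receives almost all of A's weight
```

Lowering `temperature` sharpens the weighting toward the most similar molecules; raising it spreads the weight more evenly.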
Enhancing Peak Assignment in 13C NMR Spectroscopy: A Novel Approach Using Multimodal Alignment

Xu, Hao; Zhou, Zhengyang; Hong, Pengyu (July 2024, AI4Science Workshop of 41st International Conference on Machine Learning)

Nuclear magnetic resonance (NMR) spectroscopy plays an essential role in deciphering molecular structure and dynamic behavior. While AI-enhanced NMR prediction models hold promise, challenges persist in tasks such as molecular retrieval, isomer recognition, and peak assignment. In response, this paper introduces a novel solution, Knowledge-Guided Multi-Level Multimodal Alignment with Instance-Wise Discrimination (K-M3AID), which establishes correspondences between two heterogeneous modalities: molecular graphs and NMR spectra. K-M3AID employs a dual-coordinated contrastive learning architecture with three key modules: a graph-level alignment module, a node-level alignment module, and a communication channel. Notably, K-M3AID introduces knowledge-guided instance-wise discrimination into contrastive learning within the node-level alignment module. In addition, K-M3AID shows that skills acquired during node-level alignment improve graph-level alignment, suggesting that meta-learning is an inherent property of the framework. Empirical results demonstrate the effectiveness of K-M3AID in multiple zero-shot tasks.
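The graph-level alignment described above is a form of contrastive learning between two modalities. As a generic illustration only, the sketch below computes a symmetric InfoNCE loss of the kind commonly used for cross-modal alignment, assuming row i of the graph and spectrum embeddings comes from the same molecule. It is not K-M3AID's exact objective; in particular it omits the knowledge-guided instance-wise discrimination of the node-level module, and the function names and `temperature` default are assumptions.

```python
import math
from typing import List

Vector = List[float]


def _normalize(v: Vector) -> Vector:
    """Scale v to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def _log_softmax_at(logits: List[float], i: int) -> float:
    """log(softmax(logits)[i]), computed stably by subtracting the max logit."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return logits[i] - log_z


def symmetric_info_nce(graph_emb: List[Vector], spec_emb: List[Vector],
                       temperature: float = 0.07) -> float:
    """Batch contrastive loss for graph/spectrum alignment (illustrative only).

    Row i of both inputs is assumed to come from the same molecule (a positive
    pair); every other row in the batch serves as a negative.
    """
    g = [_normalize(v) for v in graph_emb]
    s = [_normalize(v) for v in spec_emb]
    n = len(g)
    # sim[i][j]: temperature-scaled cosine similarity of graph i and spectrum j.
    sim = [[sum(a * b for a, b in zip(g[i], s[j])) / temperature for j in range(n)]
           for i in range(n)]
    graph_to_spec = -sum(_log_softmax_at(sim[i], i) for i in range(n)) / n
    spec_to_graph = -sum(_log_softmax_at([sim[j][i] for j in range(n)], i)
                         for i in range(n)) / n
    return (graph_to_spec + spec_to_graph) / 2
```

Minimizing this loss pulls each molecule's graph embedding toward its own spectrum embedding and away from the other spectra in the batch, which is what makes zero-shot retrieval by nearest neighbor possible.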
Deep-learning optical flow for measuring velocity fields from experimental data

https://doi.org/10.1039/d4sm00483c

Tran, Phu N; Ray, Sattvic; Lemma, Linnea; Li, Yunrui; Sweeney, Reef; Baskaran, Aparna; Dogic, Zvonimir; Hong, Pengyu; Hagan, Michael F (September 2024, Soft Matter)

Deep learning-based optical flow (DLOF) uses deep convolutional neural networks to extract features from video frames and estimate the inter-frame motion of objects. For densely labeled systems, DLOF computes velocity fields more accurately than particle image velocimetry (PIV).
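For context on the PIV baseline that DLOF is compared against: PIV estimates local motion by finding the shift that maximizes the cross-correlation between an interrogation window in one frame and candidate windows in the next. The sketch below shows that core step on plain Python lists; it is a simplified illustration of the baseline, not the DLOF method, and the function name and parameters are hypothetical.

```python
from typing import List, Tuple

Frame = List[List[float]]


def piv_displacement(frame_a: Frame, frame_b: Frame, top: int, left: int,
                     size: int, max_disp: int) -> Tuple[int, int]:
    """Integer (dy, dx) shift that best matches a square window of frame_a in frame_b.

    Uses plain (unnormalized) cross-correlation; practical PIV codes usually
    normalize the correlation and refine the peak to sub-pixel accuracy.
    """
    window = [row[left:left + size] for row in frame_a[top:top + size]]
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_disp, max_disp + 1):
        for dx in range(-max_disp, max_disp + 1):
            y0, x0 = top + dy, left + dx
            # Skip candidate windows that fall outside frame_b.
            if y0 < 0 or x0 < 0 or y0 + size > len(frame_b) or x0 + size > len(frame_b[0]):
                continue
            score = sum(window[i][j] * frame_b[y0 + i][x0 + j]
                        for i in range(size) for j in range(size))
            if score > best_score:
                best, best_score = (dy, dx), score
    return best
```

Dividing the returned (dy, dx) by the frame interval gives a velocity in pixels per unit time; repeating this over a grid of windows yields a coarse velocity field. Because each window yields a single vector, PIV resolution is limited by window size, which is one motivation for learned, dense alternatives.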
A machine learning approach to robustly determine director fields and analyze defects in active nematics

https://doi.org/10.1039/d3sm01253k

Li, Yunrui; Zarei, Zahra; Tran, Phu N.; Wang, Yifei; Baskaran, Aparna; Fraden, Seth; Hagan, Michael F.; Hong, Pengyu (February 2024, Soft Matter)

A machine learning model that reliably calculates director fields from raw experimental images of active nematics. The model is accurate, robust to noise, and generalizable, which enhances downstream analyses such as the detection and tracking of topological defects.
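Once a director field is available, a standard, model-independent way to locate and classify topological defects (not necessarily the procedure used in the paper) is to compute the winding number of the director angle around a small closed loop. Because a nematic director is headless (angle θ is equivalent to θ + π), each step's angle change is wrapped into [-π/2, π/2) before summing. A minimal sketch, with hypothetical function names:

```python
import math
from typing import Sequence


def wrap_nematic(d: float) -> float:
    """Map an angle difference into [-pi/2, pi/2), since directors are headless."""
    return (d + math.pi / 2) % math.pi - math.pi / 2


def winding_charge(angles: Sequence[float]) -> float:
    """Topological charge from director angles sampled in order around a closed loop.

    Returns approximately +0.5 or -0.5 around a half-integer defect, 0 elsewhere.
    """
    n = len(angles)
    total = sum(wrap_nematic(angles[(k + 1) % n] - angles[k]) for k in range(n))
    return total / (2 * math.pi)
```

In practice the loop would be, for example, the ring of pixels around a candidate point in the computed director field; the loop must be sampled densely enough that neighboring angles differ by well under π/2.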
Toward Automatic Inference of Glycan Linkages Using MSⁿ and Machine Learning─Proof of Concept Using Sialic Acid Linkages

https://doi.org/10.1021/jasms.3c00132

Ni, Xinyi; Murray, Nathan B.; Archer-Hartmann, Stephanie; Pepi, Lauren E.; Helm, Richard F.; Azadi, Parastoo; Hong, Pengyu (October 2023, Journal of the American Society for Mass Spectrometry)

Motif-Based Graph Representation Learning with Application to Chemical Molecules

https://doi.org/10.3390/informatics10010008

Wang, Yifei; Chen, Shiyang; Chen, Guobin; Shurberg, Ethan; Liu, Hang; Hong, Pengyu (March 2023, Informatics)

This work considers the task of representation learning on the attributed relational graph (ARG). Both the nodes and edges in an ARG are associated with attributes/features, allowing ARGs to encode the rich structural information widely observed in real applications. Existing graph neural networks have a limited ability to capture complex interactions within local structural contexts, which prevents them from taking full advantage of the expressive power of ARGs. We propose the motif convolution module (MCM), a new motif-based graph representation learning technique that better utilizes local structural information. One of MCM’s advantages over existing motif-based models is its ability to handle continuous edge and node features. MCM builds a motif vocabulary in an unsupervised way and deploys a novel motif convolution operation to extract the local structural context of individual nodes, which is then used to learn higher-level node representations via a multilayer perceptron and/or message passing in graph neural networks. Compared with other graph learning approaches on the classification of synthetic graphs, our approach is substantially better at capturing structural context. We also demonstrate the performance and explainability advantages of our approach by applying it to several molecular benchmarks.
Knowledgebra: An Algebraic Learning Framework for Knowledge Graph

https://doi.org/10.3390/make4020019

Yang, Tong; Wang, Yifei; Sha, Long; Engelbrecht, Jan; Hong, Pengyu (June 2022, Machine Learning and Knowledge Extraction)

Knowledge graph (KG) representation learning aims to encode entities and relations into dense continuous vector spaces such that the knowledge contained in a dataset is represented consistently. Dense embeddings trained on KG datasets benefit a variety of downstream tasks such as KG completion and link prediction. However, existing KG embedding methods fall short of providing a systematic solution for the global consistency of knowledge representation. We developed a mathematical language for KGs based on an observation of their inherent algebraic structure, which we term Knowledgebra. By analyzing five distinct algebraic properties, we proved that the semigroup is the most reasonable algebraic structure for the relation embedding of a general knowledge graph. We implemented an instantiation model, SemE, using simple matrix semigroups, which exhibits state-of-the-art performance on standard datasets. Moreover, we proposed a regularization-based method to integrate chain-like logic rules derived from human knowledge into embedding training, which further demonstrates the power of the developed language. To our knowledge, by applying abstract algebra to statistical learning, this work develops the first formal language for general knowledge graphs and also sheds light on the problem of neural-symbolic integration from an algebraic perspective.
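To illustrate why a matrix semigroup suits relation embeddings, the sketch below assumes each entity is a vector and each relation acts on the head entity by matrix multiplication. Following r1 and then r2 maps h to M_r2 (M_r1 h) = (M_r2 M_r1) h, and associativity of matrix multiplication makes longer relation chains well defined however they are grouped. The Euclidean triple score, the squared-Frobenius rule penalty, and the rotation-matrix toy relations are illustrative assumptions, not SemE's exact formulation.

```python
import math
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def apply(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product m @ v: the relation acting on an entity embedding."""
    return [sum(m[i][k] * v[k] for k in range(len(v))) for i in range(len(m))]


def triple_distance(head: Vector, rel: Matrix, tail: Vector) -> float:
    """Distance between M_r h and t; a plausible triple (h, r, t) should score near 0."""
    return math.dist(apply(rel, head), tail)


def chain_rule_penalty(r1: Matrix, r2: Matrix, r3: Matrix) -> float:
    """Hypothetical regularizer for the rule r1(x, y) and r2(y, z) => r3(x, z).

    Returns the squared Frobenius norm of M_r2 M_r1 - M_r3.
    """
    prod = matmul(r2, r1)
    return sum((prod[i][j] - r3[i][j]) ** 2
               for i in range(len(prod)) for j in range(len(prod[0])))


def rotation(theta: float) -> Matrix:
    """2x2 rotation matrix, used here as a toy relation embedding."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


if __name__ == "__main__":
    r30, r60, r90 = rotation(math.pi / 6), rotation(math.pi / 3), rotation(math.pi / 2)
    print(chain_rule_penalty(r30, r60, r90))  # ~0: the rule is consistent
    print(chain_rule_penalty(r30, r30, r90))  # > 0: the rule is violated
```

Adding such a penalty to the training loss nudges the learned relation matrices toward satisfying the logic rule, in the spirit of the regularization-based rule integration described in the abstract.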